NSF PAR Search | NSF Public Access Repository


PANDA: Query Evaluation in Submodular Width

https://doi.org/10.46298/THEORETICS.25.12

Khamis, Mahmoud Abo; Ngo, Hung Q; Suciu, Dan (April 2025, TheoretiCS)

In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in time bounded by the corresponding information-theoretic bounds. In this paper, we describe PANDA, an algorithm that takes the Shannon inequality underlying the bound and translates each proof step into an algorithmic step corresponding to some database operation. PANDA computes answers to a conjunctive query in time given by the submodular width plus the output size of the query. The version in this paper represents a significant simplification of the original version [ANS, PODS'17]. Comment: 42 pages. This is the TheoretiCS journal version.
Free, publicly-accessible full text available April 30, 2026
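As a small illustration of the kind of Shannon-inequality proof that PANDA takes as input, consider the triangle query Q(X,Y,Z) :- R(X,Y), S(Y,Z), T(Z,X), assuming each input relation has at most N tuples. This is the classical output-size bound for this query, chosen here only for illustration. It is not an example taken from the paper, and it does not show the database operations PANDA derives from each step. Let h be the entropy (base 2) of the uniform distribution over the output tuples, so h(XYZ) = log₂|Q|. Since the projection of the output onto (X,Y) lies in R, h(XY) ≤ log₂ N, and similarly for (Y,Z) and (X,Z). Two Shannon inequalities then suffice:

```latex
\begin{align*}
h(XY) + h(YZ) &\ge h(XYZ) + h(Y) && \text{(submodularity)}\\
h(Y) + h(XZ)  &\ge h(XYZ)        && \text{(subadditivity)}\\
\text{hence}\quad 2\,h(XYZ) &\le h(XY) + h(YZ) + h(XZ) \le 3\log_2 N,\\
\text{so}\quad |Q| = 2^{h(XYZ)} &\le N^{3/2}.
\end{align*}
```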
Polynomial Time Convergence of the Iterative Evaluation of Datalog° Programs

https://doi.org/10.1145/3695839

Im, Sungjin; Moseley, Benjamin; Ngo, Hung Q; Pruhs, Kirk (November 2024, Proceedings of the ACM on Management of Data)

Datalog° is an extension of Datalog that allows for aggregation and recursion over an arbitrary commutative semiring. Like Datalog, Datalog° programs can be evaluated via the natural iterative algorithm until a fixed point is reached. However, unlike Datalog, the natural iterative evaluation of some Datalog° programs over some semirings may not converge. It is known that the commutative semirings for which the iterative evaluation of Datalog° programs is guaranteed to converge are exactly those semirings that are stable. Previously, the best known upper bound on the number of iterations until convergence over p-stable semirings was $\sum_{i=1}^{n} (p+2)^i = \Theta(p^n)$ steps, where $n$ is (essentially) the output size. We establish that, in fact, the natural iterative evaluation of a Datalog° program over a p-stable semiring converges within a polynomial number of iterations. In particular, our upper bound is $O(\sigma p n^2 (n^2 \lg \lambda + \lg \sigma))$, where $\sigma$ is the number of elements in the semiring present in either the input databases or the Datalog° program, and $\lambda$ is the maximum number of terms in any product in the Datalog° program.
Full Text Available
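The natural iterative algorithm described above can be sketched in Python for one concrete case. The sketch assumes the tropical semiring (min, +) with nonnegative weights, which is 0-stable because 1 ⊕ a = min(0, a) = 0 = 1. It also assumes a hypothetical all-pairs shortest-path program D(x, y) :- E(x, y) ⊕ min_z D(x, z) ⊗ E(z, y). The function name and the program are illustrative and are not taken from the paper. Each iteration recomputes every fact from the previous iteration's facts until nothing changes.

```python
import math

# Tropical (min, +) semiring: ⊕ = min, ⊗ = +, semiring 0 = +inf, semiring 1 = 0.
# With nonnegative weights it is 0-stable: 1 ⊕ a = min(0, a) = 0 = 1.
INF = math.inf


def naive_shortest_paths(nodes, edges):
    """Naive iterative evaluation of the hypothetical Datalog° program
        D(x, y) :- E(x, y) ⊕ min_z ( D(x, z) ⊗ E(z, y) )
    over the tropical semiring. `edges` maps (x, y) to a nonnegative weight.
    Returns (D, number_of_iterations)."""
    D = {(x, y): INF for x in nodes for y in nodes}  # every fact starts at semiring 0
    iterations = 0
    while True:
        iterations += 1
        new_D = {}
        for x in nodes:
            for y in nodes:
                best = edges.get((x, y), INF)  # the E(x, y) term
                for z in nodes:  # ⊕ over z of D(x, z) ⊗ E(z, y)
                    best = min(best, D[(x, z)] + edges.get((z, y), INF))
                new_D[(x, y)] = best
        if new_D == D:  # fixed point: an iteration changed nothing
            return D, iterations
        D = new_D
```

For example, with edges a→b (weight 1), b→c (2), and a→c (5), the loop stops after three iterations with D(a, c) = 3. The second iteration finds the shorter path through b, and the third confirms the fixed point.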
Convergence of datalog over (Pre-) Semirings

https://doi.org/10.1145/3643027

Abo Khamis, Mahmoud; Ngo, Hung Q; Pichler, Reinhard; Suciu, Dan; Wang, Yisu Remy (April 2024, Journal of the ACM)

Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this article, we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naïve evaluation algorithm on any datalog program.
Full Text Available
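Semi-naïve evaluation avoids recomputing every fact in every round. It joins only the facts that changed in the previous round (the delta). The Python sketch below assumes the tropical semiring (min, +) with nonnegative edge weights and a hypothetical single-source shortest-path program D(y) :- [y = s] ⊕ min_x D(x) ⊗ E(x, y). A fact enters the delta when its value strictly improves. This is a generic illustration of the idea. It is not the paper's formulation, nor its characterization of the ordered semirings on which semi-naïve evaluation is correct.

```python
import math

INF = math.inf  # semiring 0 of the tropical (min, +) semiring


def seminaive_sssp(source, edges):
    """Semi-naive evaluation of the hypothetical program
        D(y) :- [y = source] ⊕ min_x ( D(x) ⊗ E(x, y) )
    over the tropical semiring, assuming nonnegative weights in `edges`.
    Only facts whose value improved in the previous round (the delta) are
    joined with E in the next round. Returns (D, number_of_rounds)."""
    D = {source: 0}  # 0 is the semiring 1 (the empty path)
    delta = {source: 0}
    rounds = 0
    while delta:
        rounds += 1
        new_delta = {}
        for (x, y), w in edges.items():
            if x in delta:  # join only the changed facts with E
                cand = delta[x] + w
                # Keep the candidate only if it strictly improves y's current
                # value and any candidate already found for y in this round.
                if cand < D.get(y, INF) and cand < new_delta.get(y, INF):
                    new_delta[y] = cand
        D.update(new_delta)
        delta = new_delta
    return D, rounds
```

On edges s→a (4), s→b (1), b→a (2), and a→c (1), it returns D = {s: 0, a: 3, b: 1, c: 4} after four rounds. In the fourth round, the delta contains only c, which has no outgoing edges, so nothing new is derived.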
Datalog in Wonderland

https://doi.org/10.1145/3552490.3552492

Khamis, Mahmoud Abo; Ngo, Hung Q.; Pichler, Reinhard; Suciu, Dan; Remy Wang, Yisu (July 2022, ACM SIGMOD Record)

Modern data analytics applications, such as knowledge graph reasoning and machine learning, typically involve recursion through aggregation. Such computations pose great challenges to both system builders and theoreticians: first, to derive simple yet powerful abstractions for these computations; second, to define and study the semantics for the abstractions; third, to devise optimization techniques for these computations. In recent work we presented a generalization of Datalog called Datalog°, which addresses these challenges. Datalog° is a simple abstraction that allows aggregates to be interleaved with recursion and retains much of the simplicity and elegance of Datalog. We define its formal semantics based on an algebraic structure called Partially Ordered Pre-Semirings, and illustrate through several examples how Datalog° can be used for a variety of applications. Finally, we describe a new optimization rule for Datalog°, called the FGH-rule, and illustrate it on several examples, including a simple magic-set rewriting, generalized semi-naïve evaluation, and a bill-of-material example. We also briefly discuss the implementation of the FGH-rule and present some experimental validation of its effectiveness.
Full Text Available
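In its usual statement, the FGH-rule says the following. If G(F(X)) = H(G(X)) for every X, then applying G to the least fixpoint of F gives the same result as the least fixpoint of H. The Python sketch below illustrates this on a simple magic-set style rewriting under Boolean (set) semantics. F computes the transitive closure of a hypothetical edge relation E. G selects the nodes reachable from a fixed, hypothetical start node. H computes those nodes directly, without materializing the whole closure. The premise is only spot-checked on one input here, and the way the authors' system establishes it is not reproduced.

```python
def lfp(step, start=frozenset()):
    """Least fixpoint of a monotone function on finite sets, by naive iteration."""
    current = start
    while True:
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt


# Hypothetical edge relation E and start node SRC (illustrative data only).
E = frozenset({("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")})
SRC = "a"


def F(T):  # T(x, y) :- E(x, y) ∪ T(x, z), E(z, y)   (full transitive closure)
    return E | {(x, y) for (x, z) in T for (z2, y) in E if z == z2}


def G(T):  # Q(y) :- T(SRC, y)                          (query over the closure)
    return frozenset(y for (x, y) in T if x == SRC)


def H(R):  # R(y) :- E(SRC, y) ∪ R(z), E(z, y)          (rewritten program)
    return frozenset({y for (x, y) in E if x == SRC} | {y for (z, y) in E if z in R})


# Spot-check the FGH premise on one input, then compare both sides of the rule.
sample = frozenset({("a", "c"), ("b", "b")})
assert G(F(sample)) == H(G(sample))
print(sorted(G(lfp(F))), sorted(lfp(H)))  # both are ['b', 'c', 'd']
```

The rewritten program H never derives the unrelated fact (x, y) or paths that start elsewhere, which is the point of the magic-set rewriting.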
Convergence of Datalog over (Pre-) Semirings

https://doi.org/10.1145/3517804.3524140

Abo Khamis, Mahmoud; Ngo, Hung Q.; Pichler, Reinhard; Suciu, Dan; Wang, Yisu Remy (June 2022, PODS)

Full Text Available
Optimizing Recursive Queries with Program Synthesis

https://doi.org/10.1145/3514221.3517827

Wang, Yisu Remy; Abo Khamis, Mahmoud; Ngo, Hung Q.; Pichler, Reinhard; Suciu, Dan (June 2022, SIGMOD)

Full Text Available
Bag Query Containment and Information Theory

https://doi.org/10.1145/3472391

Khamis, Mahmoud Abo; Kolaitis, Phokion G.; Ngo, Hung Q.; Suciu, Dan (September 2021, ACM Transactions on Database Systems)

The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the conjunctive query containment under bag semantics. These connections are established using information inequalities, which are considered to be the laws of information theory. Our first main result asserts that deciding the validity of a generalization of information inequalities is many-one equivalent to the restricted case of conjunctive query containment in which the containing query is acyclic; thus, either both these problems are decidable or both are undecidable. Our second main result identifies a new decidable case of the conjunctive query containment problem under bag semantics. Specifically, we give an exponential-time algorithm for conjunctive query containment under bag semantics, provided the containing query is chordal and admits a simple junction tree.
Full Text Available
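Under bag semantics, Q1 is contained in Q2 when, on every database D, each answer's multiplicity in Q1(D) is at most its multiplicity in Q2(D). For Boolean queries, this compares the number of satisfying variable assignments. The brute-force Python sketch below counts those assignments on a single database, assuming set-valued input relations. Such a check can only refute containment on a given database, never decide it. The decision procedure in the paper goes through information inequalities instead. The query pair is hypothetical: Q1() :- R(x, y) and the acyclic Q2() :- R(x, y), R(x, z).

```python
from itertools import product


def bag_count(query, db):
    """Number of satisfying variable assignments of a Boolean conjunctive query,
    i.e. the multiplicity of the empty answer tuple under bag semantics, assuming
    set-valued input relations. `query` is a list of (relation, variables) atoms
    and `db` maps relation names to sets of tuples."""
    variables = sorted({v for _, vs in query for v in vs})
    domain = sorted({c for rel in db.values() for t in rel for c in t})  # active domain
    count = 0
    for values in product(domain, repeat=len(variables)):
        assign = dict(zip(variables, values))
        if all(tuple(assign[v] for v in vs) in db[rel] for rel, vs in query):
            count += 1
    return count


# Hypothetical instance and queries.
db = {"R": {(1, 2), (1, 3), (2, 3)}}
q1 = [("R", ("x", "y"))]
q2 = [("R", ("x", "y")), ("R", ("x", "z"))]
print(bag_count(q1, db), bag_count(q2, db))
```

Here Q1 counts |R| = Σₓ deg(x), while Q2 counts Σₓ deg(x)², so Q1 is bag-contained in Q2 on every database. On this three-tuple instance, the two counts are 3 and 5.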
Bag Query Containment and Information Theory

https://doi.org/10.1145/3375395.3387645

Abo Khamis, Mahmoud; Kolaitis, Phokion G.; Ngo, Hung Q.; Suciu, Dan (June 2020, PODS)

The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the conjunctive query containment under bag semantics. These connections are established using information inequalities, which are considered to be the laws of information theory. Our first main result asserts that deciding the validity of a generalization of information inequalities is many-one equivalent to the restricted case of conjunctive query containment in which the containing query is acyclic; thus, either both these problems are decidable or both are undecidable. Our second main result identifies a new decidable case of the conjunctive query containment problem under bag semantics. Specifically, we give an exponential-time algorithm for conjunctive query containment under bag semantics, provided the containing query is chordal and admits a simple junction tree.
Full Text Available
Decision Problems in Information Theory

https://doi.org/10.4230/LIPIcs.ICALP.2020.106

Abo Khamis, Mahmoud; Kolaitis, Phokion G; Ngo, Hung Q; Suciu, Dan (January 2020, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020))

Full Text Available
On Functional Aggregate Queries with Additive Inequalities

https://doi.org/10.1145/3294052.3319694

Abo Khamis, Mahmoud; Curtin, Ryan R.; Moseley, Benjamin; Ngo, Hung Q.; Nguyen, XuanLong; Olteanu, Dan; Schleich, Maximilian (January 2019, Symposium on Principles of Database Systems)

Full Text Available